Abstract

The theory of large deviations is applied to the study of the asymptotic properties of the stochastic approximation algorithms (1.1) and (1.2). The method provides a useful alternative to the currently used technique of obtaining rate of convergence results by studying the sequence $\{(X_n - \theta)/\sqrt{a_n}\}$ (for (1.1)), where $\theta$ is a 'stable' point of the algorithm. Let $G$ be a bounded neighborhood of $\theta$, which is in the domain of attraction of $\theta$ for the 'limit ODE'. The process $x^n(\cdot)$ is defined as a 'natural interpolation' of $\{X_j, j \ge n\}$ with $x^n(0) = X_n$, and interpolation intervals $\{a_j, j \ge n\}$. Define $\tau_G^n = \min\{t : x^n(t) \notin G\}$. Then it is shown (among other things) that $P_x\{\tau_G^n \le T\} \sim \exp(-n^q V)$, where $q$ depends on $\{a_n, c_n\}$, and $V$ depends on $b(\cdot)$, $\operatorname{cov} \xi_n$, and $G$. Such estimates imply that the asymptotic behavior is much better than suggested by the 'local linearization methods', and they yield much new insight into the asymptotic behavior.

The technique is applicable to related problems in the asymptotic analysis of recursive algorithms, and requires weaker conditions on the dynamics than do the 'linearization methods'. The necessary basic background is provided and the optimal control problems associated with getting the $V$ above are derived.
1. Introduction

The paper deals with a useful and heretofore unexploited approach to the asymptotic behavior of stochastic approximation (SA) like algorithms of the form

$$X_{n+1} = X_n + a_n b(X_n) + a_n \xi_n, \qquad a_n = (n+1)^{-\rho}, \tag{1.1}$$

or of the 'Kiefer-Wolfowitz' form

$$X_{n+1} = X_n + a_n b(X_n) + a_n \xi_n / c_n, \qquad a_n = (n+1)^{-\rho}, \quad c_n = (n+1)^{-\gamma}, \quad X_n \in R^r, \tag{1.2}$$

where $0 < \rho \le 1$ and $0 < \gamma < \rho/2$. To avoid excess notation, let $\{\xi_n\}$ be mutually independent and identically distributed. The noise sequence $\{\xi_n\}$ is mean zero and Gaussian, with covariance matrix $R \ge 0$. As seen below, it is hard to do some of the required calculations in the non-Gaussian case, although the basic theory is much more widely applicable. Despite the restriction to the Gaussian case, the results shed considerable new light on the asymptotic behavior. One would expect that the order of the obtained estimates would hold under much weaker conditions. Of particular interest are estimates (as a function of $n$) of the probability that the 'tail' of the SA sequence $\{X_m, m \ge n\}$ escapes from a neighborhood of a 'stable' point of the algorithm. By a 'stable point' we mean a point $\theta$ at which $\dot x = b(x)$ is asymptotically stable. Under our conditions, if $X_n$ is in a small neighborhood of $\theta$ often enough, then it converges to $\theta$ w.p.1. We are not interested in the w.p.1 convergence, only in the 'rate of convergence' or in the behavior of $\{X_n\}$ in a neighborhood of $\theta$. So we simply assume that $X_n \to \theta$ w.p.1. The estimates in the sequel imply that the asymptotic behavior is much better than one would expect from using the usual limit theory, which is based on the asymptotic normality of the sequence of suitably normalized errors (say of $\{(X_n - \theta)/\sqrt{a_n}\}$ for (1.1)). The classical theory is much more 'local' about $\theta$, and does not exploit as fully as possible the stabilizing properties of the ODE $\dot x = b(x)$ in a neighborhood of $\theta$.
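To make the recursion (1.1) concrete, the following is a minimal scalar sketch. The linear drift $b(x) = -k(x - \theta)$, the function name `run_sa`, and all parameter values are assumptions chosen for illustration; none of them come from the paper.

```python
import random

def run_sa(n_steps, rho=0.8, k=1.0, theta=0.0, x0=1.0, sigma=1.0, seed=0):
    """Iterate (1.1): X_{n+1} = X_n + a_n b(X_n) + a_n xi_n, a_n = (n+1)^(-rho).

    Assumed (illustrative) drift: b(x) = -k (x - theta).
    The noise xi_n is i.i.d. N(0, sigma^2).
    Returns the list [X_0, ..., X_{n_steps}].
    """
    rng = random.Random(seed)
    xs = [x0]
    for n in range(n_steps):
        a_n = (n + 1) ** (-rho)
        drift = -k * (xs[-1] - theta)
        noise = rng.gauss(0.0, sigma)
        xs.append(xs[-1] + a_n * drift + a_n * noise)
    return xs

path = run_sa(20000)
print(abs(path[-1]))  # distance of the last iterate from theta = 0
```

Replacing the drift by a merely Lipschitz function (for instance, one with different slopes on either side of $\theta$) requires changing only the `drift` line.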

An additional advantage of our approach is that $b(\cdot)$ is not required to have continuous derivatives, as the classical theory requires. It need only be Lipschitz continuous. Thus one can treat problems where (e.g.) $b(\cdot)$ is obtained from a min-max operation, or where (scalar case) the slope of $b(\cdot)$ is discontinuous at $\theta$. E.g. $b(x) = -k_1(x-\theta)$ for $x > \theta$ and $b(x) = -k_2(x-\theta)$ for $x < \theta$, where $k_1 \ne k_2$ and $k_i > 0$. Results of simulations support the idea that the iterates spend (asymptotically) almost all the time on the part with the smaller slope, and this behavior is implied by our results. Also, simple constraints can readily be handled. For example let $\{X_n\}$ be confined to $[a,b]$, where $-\infty < a < b < \infty$, and $b(x) > 0$ on $[a,b]$. Then $X_n \to b$, and we can obtain estimates of the behavior of the sequence near $b$ (e.g., probability of escape from a small neighborhood of $b$). This cannot be done with the classical rate of convergence theory for SA's.

The particular problem of interest will now be described. Let $G$ denote a bounded open set which is in the domain of attraction of $\theta$ (for $\dot x = b(x)$) and whose boundary is piecewise differentiable. Roughly, we are interested in estimates of the type $P\{X_{n+m} \notin G, \text{ some } m \ge 1 \mid X_n \in \text{neighborhood of } \theta\}$, and we now make this precise. Define $t_n = \sum_{i=0}^{n-1} a_i$ and $m(t) = \max\{n : t_n \le t\}$. Then $m(t_n) = n$ and $\sum_{i=n}^{m(t_n+t)} a_i / t \to 1$ for each $t > 0$. Both $t_n$ and $m(t)$ depend on $\rho$. For each $n$, define the process $x^n(\cdot)$ on $[0,\infty)$ as follows. It is piecewise linear, with initial condition $X_n = x^n(0)$ and breakpoints $\{0, t_{n+1}-t_n, t_{n+2}-t_n, \ldots\} = \{0, a_n, a_n + a_{n+1}, \ldots\}$ and $x^n(t_m - t_n) = X_m$. Thus $x^n(\cdot)$ 'starts' at the $n$th iteration. Such an interpolation has been very useful in the analysis of the asymptotic properties of $\{X_n\}$, and is the key to the so-called 'ODE method' [1],[2]. Define $\tau_G^n = \min\{t : x^n(t) \notin G\}$. If $X_n \to \theta$ w.p.1 (or even 'weakly'), then $E_x \tau_G^n$ is not necessarily defined. But $P_x\{\tau_G^n \le T\}$ is of considerable interest as a criterion of performance and stability of the algorithm, where $T$ is any positive number. Here $P_x$ denotes the probability, conditioned on the event that $X_n = x \in G$. The dependence on $\rho$ and $\gamma$ and the structure of $b(\cdot)$ is of particular interest.
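The interpolation and the escape time can be sketched for the scalar case as follows. This is an illustrative sketch only: the function name `escape_time`, the restriction to an interval $G = (\text{lo}, \text{hi})$, and the finite horizon given by the length of the input lists are assumptions made here.

```python
import math

def escape_time(xs, steps, lo, hi):
    """First time the piecewise linear interpolation leaves the open interval (lo, hi).

    xs[k]    plays the role of X_{n+k}  (values at the breakpoints),
    steps[k] plays the role of a_{n+k}  (length of the k-th interpolation interval),
    so len(steps) == len(xs) - 1. Returns math.inf if no exit occurs on the horizon.
    """
    if not (lo < xs[0] < hi):
        return 0.0
    t = 0.0
    for k in range(len(steps)):
        x0, x1 = xs[k], xs[k + 1]
        if x1 >= hi or x1 <= lo:
            # x0 is strictly inside, so the segment crosses the boundary once.
            bound = hi if x1 >= hi else lo
            frac = (bound - x0) / (x1 - x0)
            return t + frac * steps[k]
        t += steps[k]
    return math.inf

tau = escape_time([0.0, 0.5, 1.5], [1.0, 2.0], -1.0, 1.0)
print(tau)  # the path reaches the level 1 halfway through the second interval
```

Estimating $P_x\{\tau_G^n \le T\}$ by simulation would amount to counting how often `escape_time` returns a value not exceeding $T$ over many independent runs started at $X_n = x$.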
Since the probability $P_x\{\tau_G^n \le T\}$ tends to zero as $n \to \infty$, it is natural to look for a normalizing sequence. In particular, we seek a sequence $\lambda_n \to 0$ such that the limit in (1.3) exists, where $0 < V < \infty$.

$$\lim_n \lambda_n \log P_x\{\tau_G^n \le T\} = -V. \tag{1.3}$$

Under quite broad conditions, (1.3) is continuous in $x$ in a neighborhood of $\theta$.

Let $C_x[0,T]$ denote the space of $R^r$ valued continuous functions on $[0,T]$, with initial value $x$, and with the topology of uniform convergence. Let $A \subset C_x[0,T]$. Then estimates for $\lim_n \lambda_n \log P_x\{x^n(\cdot) \in A\}$ are also provided. We restrict attention to the Gaussian case, since it is hard to obtain the proper normalizing sequences $\{\lambda_n\}$ in general, and the Gaussian case is quite interesting in itself. (The results in the sequel also indicate what is needed in the more general cases.) Very similar reasons require the use of the 'small white noise model' in singular perturbation studies. But, despite this restriction, singular perturbation theory has achieved some significant results [3],[4]. Results on the robustness of the estimates with respect to the noise statistics appear in [11].

Estimates such as (1.3) cannot be obtained from the classical rate of convergence theory for SA's. In order to put our results in perspective, some of the classical theory is outlined briefly in Section 2. The theory of large deviations is the appropriate vehicle for getting (1.3). The necessary background is provided in Section 3. Our results involve a modification of a basic theorem of Freidlin (Theorem 2.1 in [5]), and in Section 3, his result is stated, together with a rough idea of the proof, in order to facilitate its modification for our needs. In Section 4, the basic large deviations theorem for SA's is stated, as are the modifications to Freidlin's proof which are needed to get the extensions for our cases. In Section 5, the basic theorem is specialized to the 'escape time problem', and the $\{\lambda_n\}$ are calculated in Section 6. The $V_\rho$ are obtained from the solution to a variational problem, and this is discussed in Sections 4 and 5.

The basic result is that $\lambda_n = O(a_n)$ for (1.1) and $\lambda_n = O(a_n/c_n^2)$ for (1.2). Also (for $x$ near $\theta$)

$$P_x\{\tau_G^n \le T\} \sim \exp(-V_\rho n^{\rho}) \ \text{ (for (1.1))}, \qquad P_x\{\tau_G^n \le T\} \sim \exp(-V_\rho n^{\rho - 2\gamma}) \ \text{ (for (1.2))}. \tag{1.4}$$

The $V_\rho$ are constant for $\rho \in (0,1)$, and their values appear in Section 6. The estimates (1.4) imply that the asymptotic behavior is much better than one would expect from the classical rate of convergence theory. Solving for the $V_\rho$ involves solving a variational or optimal control problem, as will be seen. But the qualitative results such as (1.4) are of interest even if the exact values of the $V_\rho$ are not known.
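The way a norming sequence of order $a_n$ turns a limit of the form (1.3) into an exponent of order $n^\rho$ can be read off heuristically for $\rho < 1$. The following is a sketch, not the paper's derivation. It assumes that $\lambda_n$ is the sum of the squared step sizes over the window $n \le j \le m(t_n+T)$, and that $a_j$ varies little over that window, which holds for large $n$ when $\rho < 1$:

```latex
\lambda_n = \sum_{j=n}^{m(t_n+T)} a_j^2
\;\approx\; a_n \sum_{j=n}^{m(t_n+T)} a_j
\;\approx\; T a_n \;\approx\; T n^{-\rho},
\qquad\text{so}\qquad
\log P_x\{\tau_G^n \le T\} \;\approx\; -\frac{V}{\lambda_n}
\;\approx\; -\frac{V}{T}\, n^{\rho}.
```

For (1.2) the same reasoning applies with $a_j^2/c_j^2$ in place of $a_j^2$, which replaces $n^{-\rho}$ by $n^{-(\rho - 2\gamma)}$.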

The theory of large deviations is of considerable potential use in the study of the asymptotic behavior of recursive algorithms. It is of potential use where one wants to avoid the 'local linearization' methods otherwise used to study the asymptotic behavior, or to take greater advantage of the stability of the 'limit ODE'. Also, see [6],[7], where it is used to obtain estimates of the probability of breakdown of an ALOHA type communications network. The application of the theory of large deviations to the SA problem involves some new considerations. The norming sequences are not standard in the large deviations literature, and the 'Lagrangians' $L(x,\beta,s)$ can depend on time here. The distinct differences between the cases $\rho = 1$ and $\rho < 1$ are not at all obvious.
2. Classical Rates of Convergence for (1.1)

In order to put the results of this paper into perspective with the other main method of studying the behavior of $\{X_n\}$ near $\theta$, some classical results are reviewed here. Our attention is confined to (1.1). Let $X_n \to \theta$ w.p.1 and define $U_n = (n+1)^{\rho/2}(X_n - \theta)$, and let $b(\cdot)$ be continuously differentiable, with $b(\theta) = 0$. Drop the i.i.d. assumption on $\{\xi_n\}$, but let it be stationary and define

$$R = \sum_{n=-\infty}^{\infty} E\, \xi_n \xi_0',$$

where the sum is assumed to be absolutely convergent. Then for (1.1),

$$U_{n+1} = \Big[ I + a_n \Big( b_x(\theta) + \frac{\rho I}{2(n+1)^{1-\rho}} \Big) + O(1/n) \Big] U_n + (n+1)^{-\rho/2} \xi_n + O(1/n)\, \xi_n. \tag{2.1}$$

Define $U^n(\cdot)$ as $x^n(\cdot)$ was defined, but using $\{U_j, j \ge n\}$ instead of $\{X_j, j \ge n\}$. For $\rho = 1$ ($\rho < 1$, resp.) let $I/2 + b_x(\theta)$ ($b_x(\theta)$, resp.) have its eigenvalues in the open left half plane. (The matrix is then said to be stable.) Then, under quite broad conditions [8], $\{U^n(\cdot)\}$ converges weakly to the stationary solution of the Ito equations

$$dU = (I/2 + b_x(\theta))\, U\, dt + R^{1/2}\, dw, \qquad \rho = 1, \tag{2.2a}$$

$$dU = b_x(\theta)\, U\, dt + R^{1/2}\, dw, \qquad \rho < 1, \tag{2.2b}$$

where $w(\cdot)$ is a standard Wiener process.
In particular, the sequence $\{U_n\}$ converges in distribution to the stationary random variable $U$ of (2.2), where $U \sim N(0, \Sigma_\rho)$ and

$$\Sigma_\rho = \begin{cases} \displaystyle\int_0^\infty [\exp t(I/2 + b_x(\theta))]\, R\, [\exp t(I/2 + b_x(\theta))]'\, dt, & \rho = 1, \\[2ex] \displaystyle\int_0^\infty [\exp t\, b_x(\theta)]\, R\, [\exp t\, b_x(\theta)]'\, dt, & \rho < 1. \end{cases}$$

Note the differences between the cases $\rho = 1$ and $\rho < 1$, in particular the more stringent stability requirement on $b_x(\theta)$ when $\rho = 1$. The limit (1.3) holds only under stability of $\dot x = b(x)$, so the more stringent requirement on $b_x(\theta)$ is not needed. In fact, (1.3) can be obtained even if $b_x(\theta)$ has a zero eigenvalue, provided that $\dot x = b(x)$ is stable at $\theta$.
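As a worked scalar illustration of the stationary covariance of (2.2), suppose (an assumption made only for this example) that $r = 1$, that the derivative of $b$ at $\theta$ equals $-k$ with $k > 0$, and that the noise variance is $R > 0$. The integrals then evaluate in closed form:

```latex
\rho < 1:\quad \Sigma = \int_0^\infty e^{-kt}\, R\, e^{-kt}\, dt = \frac{R}{2k},
\qquad\qquad
\rho = 1:\quad \Sigma = \int_0^\infty e^{(1/2 - k)t}\, R\, e^{(1/2 - k)t}\, dt = \frac{R}{2k - 1}
\quad (k > 1/2).
```

The second integral converges only when $k > 1/2$, which is the scalar form of the requirement that $I/2 + b_x(\theta)$ be stable, whereas the first needs only $k > 0$.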


An analysis of (2.2) can provide much useful information on the asymptotic behavior of $\{X_n\}$. But it cannot help us with the large deviation estimate (1.3), where the set $G$ is fixed. This is partly because $(X_n - \theta) \approx (n+1)^{-\rho/2} U_n$, which goes to zero in probability as $n \to \infty$. Also the validity of (2.2) requires continuity of $b_x(\cdot)$ at $x = \theta$. Eqn (2.2) also gives us a somewhat more pessimistic idea of the asymptotic behavior than (1.3) does.


3. The Theory of Large Deviations

As mentioned in the previous section, 'central limit' type ideas cannot be used to obtain estimates such as (1.3). The theory of large deviations is set up for just this purpose. It has proven to be a rather powerful tool for handling related problems in probability and statistics [9]. Our basic background ideas come from Freidlin [5], although they must be modified to suit our needs. Freidlin obtains large deviations estimates related to (1.3) for the system $\dot x^\epsilon = b(x^\epsilon, \xi(t/\epsilon))$, $x^\epsilon \in R^r$, where $b(\cdot,\xi)$ is uniformly Lipschitz and bounded and $\xi(\cdot)$ is a bounded stochastic process. We start by recapitulating the main ideas, and then adjust them to suit our needs.

Suppose that there is a function $H(\cdot,\cdot)$ such that for each $x$ and piecewise constant function $\alpha(\cdot)$, the limit in (3.1) exists.

$$\int_0^T H(x, \alpha(u))\, du = \lim_{\epsilon \to 0} \epsilon \log E \exp \int_0^{T/\epsilon} \alpha'(\epsilon u)\, b(x, \xi(u))\, du. \tag{3.1}$$

(An example will be given before the lemma below.) Define the dual functional (called the Cramér or Legendre transform)

$$L(x,\beta) = \sup_\alpha\, [\alpha'\beta - H(x,\alpha)].$$

For $\phi(\cdot)$ absolutely continuous, define $S(T,\phi)$ by

$$S(T,\phi) = \int_0^T L(\phi(u), \dot\phi(u))\, du,$$

and set $S(T,\phi)$ equal to $\infty$ if $\phi(\cdot)$ is not absolutely continuous. Let $A \subset C_x[0,T]$ and let $A^0$ and $\bar A$ denote the interior and closure of $A$, resp.

Assume

(A3.1) $H(\cdot,\cdot)$ is continuous and $H(x,\cdot)$ is continuously differentiable for each $x$ (this will be true for our problem).

Let $P_x^\epsilon\{\cdot\}$ denote probability conditioned on $x^\epsilon(0) = x$. Then by [5], Theorem 2.1, we have the large deviations estimate (3.2).

$$-\inf_{\phi \in A^0} S(T,\phi) \le \liminf_{\epsilon \to 0} \epsilon \log P_x^\epsilon\{x^\epsilon(\cdot) \in A\} \le \limsup_{\epsilon \to 0} \epsilon \log P_x^\epsilon\{x^\epsilon(\cdot) \in A\} \le -\inf_{\phi \in \bar A} S(T,\phi). \tag{3.2}$$

Thus, obtaining the estimates requires solving a variational problem. For the SA problems of interest, a sequence $\lambda_n \to 0$ replaces $\epsilon \to 0$. Also $L(x,\beta)$ can be written explicitly, and the variational problem is equivalent to an optimal control problem (see Section 3).

Example. Let $b(x,\xi) = b(x) + \xi$, where $\xi(\cdot)$ is mean zero, stationary and Gaussian with an integrable correlation function. Define $R = \int_{-\infty}^{\infty} E\, \xi(u) \xi'(0)\, du$. If $Y$ is scalar valued, Gaussian and $EY = 0$, then $E \exp Y = \exp(EY^2/2)$. Let $\alpha(\cdot)$ be piecewise constant on $[0,T]$. Then

$$\int_0^T H(x, \alpha(u))\, du = \frac{1}{2} \int_0^T \alpha'(u) R\, \alpha(u)\, du + \int_0^T \alpha'(u)\, b(x)\, du.$$

Thus

$$H(x,\alpha) = \alpha' R \alpha / 2 + \alpha' b(x).$$
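For the quadratic $H$ of this Gaussian example, the dual functional can be checked numerically in the scalar case. The sketch below compares a brute-force supremum over a grid of $\alpha$ values with the closed form $(\beta - b(x))^2 / (2R)$, which holds when $R > 0$. The function names, the grid, and the numerical values are assumptions chosen for illustration.

```python
def hamiltonian(alpha, b_x, R):
    """Scalar form of H(x, alpha) = alpha R alpha / 2 + alpha b(x); b_x stands for b(x)."""
    return 0.5 * R * alpha * alpha + alpha * b_x

def lagrangian_grid(beta, b_x, R, half_width=10.0, step=0.01):
    """Approximate L(x, beta) = sup_alpha [alpha * beta - H(x, alpha)] on a grid."""
    n_points = int(round(2 * half_width / step)) + 1
    best = float("-inf")
    for i in range(n_points):
        alpha = -half_width + i * step
        best = max(best, alpha * beta - hamiltonian(alpha, b_x, R))
    return best

def lagrangian_closed_form(beta, b_x, R):
    """Closed form of the Legendre transform for R > 0: (beta - b(x))^2 / (2 R)."""
    return (beta - b_x) ** 2 / (2.0 * R)

numeric = lagrangian_grid(beta=0.5, b_x=-1.0, R=2.0)
exact = lagrangian_closed_form(beta=0.5, b_x=-1.0, R=2.0)
print(numeric, exact)
```

The cost vanishes exactly when $\beta = b(x)$, that is, when the path velocity agrees with the drift of the ODE; deviations from the drift are penalized quadratically, weighted by the inverse noise covariance.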

Freidlin's proof can be modified to suit our needs. Since his proof is not short, but the modifications few, we only indicate the required modifications. This will be done in the next section. But, to get a better idea of what is needed, we backtrack and briefly discuss Freidlin's technique of proof. He used the following result (of Gartner [10], Lemmas 1.1 and 1.2) concerning large deviations estimates for a sequence of random vectors, in order to first obtain a 'finite dimensional' form of (3.2). Then via a sequence of bounds and approximations, he takes the 'finite dimensional' result into (3.2). Let $d(x,y)$ denote either the Euclidean distance, or the norm $\sup_{t \le T} |x(t) - y(t)|$ if $x$ and $y$ are functions.


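Lemma 1. ([10], Lemmas 1.1 and 1.2) Let $\{\eta^\epsilon\}$ denote a sequence of $R^k$-valued random vectors and let there be a sequence of positive numbers $\delta_\epsilon \to 0$ such that the limit $H_0(\alpha)$ exists for each $\alpha \in R^k$:

$$H_0(\alpha) = \lim_{\epsilon \to 0} \delta_\epsilon \log E \exp(\alpha' \eta^\epsilon / \delta_\epsilon).$$

Let $H_0(\cdot)$ be continuously differentiable. Define the dual function $L_0(\beta) = \sup_\alpha [\alpha'\beta - H_0(\alpha)]$. Define $\Phi_0(s) = \{\beta : L_0(\beta) \le s\}$. Then for each vector $\beta$, and each $s \ge 0$, $h > 0$ and $c > 0$, there is an $\epsilon_0 > 0$ such that for $\epsilon \le \epsilon_0$,

$$\delta_\epsilon \log P\{d(\eta^\epsilon, \Phi_0(s)) > c\} \le -(s - h), \tag{3.3a}$$

$$\delta_\epsilon \log P\{d(\eta^\epsilon, \beta) < c\} \ge -(L_0(\beta) + h). \tag{3.3b}$$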
Let $B \subset R^k$. Then from (3.3), we readily obtain (3.4) (which is the finite dimensional version of (3.2)):

$$-\inf_{\beta \in B^0} L_0(\beta) \le \liminf_{\epsilon} \delta_\epsilon \log P\{\eta^\epsilon \in B\} \le \limsup_{\epsilon} \delta_\epsilon \log P\{\eta^\epsilon \in B\} \le -\inf_{\beta \in \bar B} L_0(\beta). \tag{3.4}$$

The derivation of (3.4) from (3.3) is quite straightforward and goes roughly as follows. Let $\beta \in B^0$, and define $N_c(\beta)$ = $c$-neighborhood of $\beta$. Choose $c$ such that $N_c(\beta) \subset B^0$. Fix small $h > 0$. Then by (3.3b)

$$\delta_\epsilon \log P\{\eta^\epsilon \in B\} \ge \delta_\epsilon \log P\{d(\eta^\epsilon, \beta) < c\} \ge -(L_0(\beta) + h/2),$$

for small $\epsilon$. Now choose $c$ and $\beta$ such that the right side is within $h$ of $-\inf_{\beta \in B^0} L_0(\beta)$. Owing to the arbitrariness of $h$, this yields the left side of (3.4). Next, for any $s$ such that the (compact) set $\Phi_0(s)$ is disjoint (distance $> c > 0$) from $B$, we have

$$\delta_\epsilon \log P\{\eta^\epsilon \in B\} \le \delta_\epsilon \log P\{d(\eta^\epsilon, \Phi_0(s)) > c\}.$$

Now use (3.3a) and the largest possible $\Phi_0(s)$ (this requires that $s < \inf_{\beta \in \bar B} L_0(\beta)$). The details of obtaining (3.4) from (3.3) are readily completed.
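A one-dimensional instance in which the finite dimensional estimate can be checked directly is $\eta^\epsilon = \sqrt{\epsilon}\, Z$ with $Z$ standard normal and $\delta_\epsilon = \epsilon$. Then $H_0(\alpha) = \alpha^2/2$ and $L_0(\beta) = \beta^2/2$, and for $B = [1, \infty)$ both infima equal $1/2$. This toy case and the function names below are assumptions chosen for illustration; they do not appear in the paper.

```python
import math

def scaled_log_tail(eps, level):
    """Return eps * log P{ sqrt(eps) * Z >= level } for a standard normal Z."""
    tail = 0.5 * math.erfc(level / math.sqrt(2.0 * eps))
    return eps * math.log(tail)

def rate(beta):
    """L_0(beta) = beta^2 / 2, the dual of H_0(alpha) = alpha^2 / 2."""
    return 0.5 * beta * beta

for eps in (0.05, 0.01, 0.002):
    # The first printed value should approach the second as eps decreases.
    print(eps, scaled_log_tail(eps, 1.0), -rate(1.0))
```

The slow approach to the limit reflects the fact that the estimate captures only the exponential order of the probability, not the prefactor.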
A rough outline of the argument of Freidlin's Theorem 2.1 [5]. Now that we have the basic lemma used by Freidlin, we comment on his derivation of (3.2). In the next section the proof is extended to cover the SA case. Starting with the above Lemma 1, Freidlin proved (3.2) by an argument along the following lines. Fix $x$ and $\Delta > 0$ and let $N = T/\Delta$ be an integer. Let $\psi(\cdot)$ denote a function that is constant on each interval $[i\Delta, i\Delta + \Delta)$. Define the function $x^{\psi,\epsilon}(\cdot)$ by

$$x^{\psi,\epsilon}(t) = x + \int_0^t b(\psi(s), \xi(s/\epsilon))\, ds, \qquad x^{\psi,\epsilon}(t) \in R^r. \tag{3.5}$$

Let $\phi(\cdot)$ denote a continuous function and let $\phi_\Delta$ denote the vector $\{\phi(i\Delta), i \le N\}$. Define the vector $x_\Delta^{\psi,\epsilon} = \{x^{\psi,\epsilon}(i\Delta), i \le N\}$. Define the functional and set, resp.,

$$S^\psi(T,\phi) = \int_0^T L(\psi(s), \dot\phi(s))\, ds, \qquad \Phi^\psi(s) = \{\phi(\cdot) : \phi(0) = x,\ S^\psi(T,\phi) \le s\}.$$

Now, using the fact that the limit in (3.1) exists, Lemma 1 can be applied to the vectors $\eta^\epsilon = x_\Delta^{\psi,\epsilon}$, $\beta = \phi_\Delta$ and with $\delta_\epsilon = \epsilon$. To see this and to see how to obtain the $H_0(\cdot)$ and $L_0(\cdot)$ used in Lemma 1, for a set of $r$-vectors $\{\alpha_i, i \le N\}$ let $\alpha(\cdot)$ in (3.1) take value $(\alpha_1 + \ldots + \alpha_N)$ on $[0,\Delta)$, $(\alpha_2 + \ldots + \alpha_N)$ on $[\Delta, 2\Delta)$, $\ldots$, and let $\alpha = \{\alpha_i, i \le N\}$. Define $H_0(\alpha)$ as the limit in Lemma 1 for these vectors, with $\eta^\epsilon = x_\Delta^{\psi,\epsilon}$ and $\delta_\epsilon = \epsilon$. Thus, since the limit (3.1) exists, $H_0(\alpha)$ is well defined for each $\alpha$, and so is $L_0(\beta)$.
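The reason for the staircase choice of $\alpha(\cdot)$ is a summation by parts. The following is a sketch under stated assumptions: the sample points are taken to be $i\Delta$, $i = 1, \ldots, N$; $\alpha$ denotes the stacked vector of the $\alpha_i$; $\bar\alpha(\cdot)$ is notation introduced here for the staircase function; and (3.1) is assumed to apply on each interval where $\psi(\cdot)$ is constant.

```latex
\alpha' x_\Delta^{\psi,\epsilon}
= \sum_{i=1}^{N} \alpha_i'\, x^{\psi,\epsilon}(i\Delta)
= \sum_{i=1}^{N} \alpha_i' x
  + \int_0^T \bar\alpha'(s)\, b(\psi(s), \xi(s/\epsilon))\, ds,
\qquad
\bar\alpha(s) = \sum_{i:\, i\Delta > s} \alpha_i .
```

With the change of variable $s = \epsilon u$, the integral divided by $\epsilon$ takes the form appearing on the right of (3.1), so that under these assumptions

```latex
H_0(\alpha) = \lim_{\epsilon \to 0} \epsilon \log E \exp\big(\alpha' x_\Delta^{\psi,\epsilon} / \epsilon\big)
= \sum_{i=1}^{N} \alpha_i' x + \int_0^T H(\psi(u), \bar\alpha(u))\, du .
```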

Applying Lemma 1 in this way yields a large deviation estimate of the type (3.3) for the 'samples' of $x^{\psi,\epsilon}(\cdot)$ and $\phi(\cdot)$, with sampling interval $\Delta$. Via a sequence of approximations based on this $\Delta$-approximation, Freidlin proves the analog (3.6) of (3.3) for the sequence† $\{x^{\psi,\epsilon}(\cdot), \epsilon > 0\}$; namely that for each fixed $\psi(\cdot)$ and each $\phi(\cdot)$, $s > 0$, $h > 0$ and $c > 0$, there is an $\epsilon_0 > 0$ such that for $\epsilon \le \epsilon_0$

$$\begin{aligned} \epsilon \log P\{d(x^{\psi,\epsilon}, \Phi^\psi(s)) > c\} &\le -(s - h), \\ \epsilon \log P\{d(x^{\psi,\epsilon}, \phi) < c\} &\ge -(S^\psi(T,\phi) + h). \end{aligned} \tag{3.6}$$

Inequality (3.2) (with $S^\psi(T,\phi)$ replacing $S(T,\phi)$) for $\{x^{\psi,\epsilon}(\cdot)\}$ follows from (3.6), just as (3.4) followed from (3.3) [5, Lemma 3.1]. The sequence of approximations alluded to above uses the fact that the behavior of $\phi(\cdot)$ and $x^{\psi,\epsilon}(\cdot)$ between the $\Delta$-sample points is 'regular' enough so that if the large deviations estimates hold for the samples for small enough $\Delta > 0$, then (3.6) holds. These approximations depend heavily on the Lipschitz continuity and boundedness of $b(\cdot,\xi)$ in order to show that the path excursions between the $i\Delta$-sampling times can be made as small as desired by making $\Delta$ small enough. Freidlin then proved (3.2) by using (3.6) and a sequence of approximations with suitably chosen $\psi(\cdot)$ and $\phi(\cdot)$. These approximations also use the boundedness of $b(\cdot,\cdot)$ and the Lipschitz condition to show that the excursions of $x^\epsilon(\cdot)$ between the $i\Delta$-sampling points can be made (uniformly) as small as desired, by making $\Delta$ small enough. We use these comments in the next section. Next, we obtain an estimate which will be needed to extend the result to the SA case.

†[Footnote identifying our terms with those in [5]; the symbol list is illegible in the source.]


A bound on the sample excursions and sums of the noise terms for (1.1) and (1.2). Since the $\xi_n$ are not bounded in the SA case, we need an estimate of the excursions of the paths of (the SA interpolations) $x^n(\cdot)$ between the $i\Delta$-sampling points, when $x^n(t) \in G$ for $t \in [0,T]$. This is provided by the following theorem.

Theorem 2. Let $\{\xi_n\}$ be mutually independent, mean zero and Gaussian with $\operatorname{var} \xi_n \le \sigma^2 < \infty$. Define $\lambda_n = \sum_{j=n}^{m(t_n+T)} a_j^2$. For each $a > 0$ and $M < \infty$ there is a $\Delta_0 > 0$ such that for $\Delta \le \Delta_0$

$$P_\Delta \equiv P\Big\{ \sup_{i\Delta < T}\ \sup_{s \le \Delta} \Big| \sum_{j=m(t_n+i\Delta)}^{m(t_n+i\Delta+s)} a_j \xi_j \Big| \ge a \Big\} \le \exp(-M/\lambda_n). \tag{3.7}$$

If $a_j$ is replaced by $a_j/c_j$ and $\lambda_n = \sum_{j=n}^{m(t_n+T)} a_j^2/c_j^2$, then (3.7) still holds if $\rho - 2\gamma > 0$.


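Proof. We do only the first case. The second is treated in the same way. Suppose that the $\xi_n$ are scalar valued; otherwise work with one component at a time. For any $\gamma > 0$, Chebychev's inequality and the Gaussian property yield (using $E \exp \alpha \xi_j \le \exp \alpha^2 \sigma^2 / 2$)

$$\begin{aligned} P\Big\{ \sup_{s \le \Delta} \sum_{j=m(t_n+i\Delta)}^{m(t_n+i\Delta+s)} a_j \xi_j \ge a \Big\} &= P\Big\{ \sup_{s \le \Delta} \exp\Big[ \gamma \sum_{j=m(t_n+i\Delta)}^{m(t_n+i\Delta+s)} a_j \xi_j \Big] \ge \exp \gamma a \Big\} \\ &\le (\exp -\gamma a) \exp\Big[ \gamma^2 \sigma^2 \sum_{j=m(t_n+i\Delta)}^{m(t_n+i\Delta+\Delta)} a_j^2 / 2 \Big] \\ &\le \exp -\Big[ \frac{2\sigma^2}{a^2} \sum_{j=m(t_n+i\Delta)}^{m(t_n+i\Delta+\Delta)} a_j^2 \Big]^{-1}, \end{aligned}$$

where the last inequality is obtained by minimizing the next to last term over $\gamma > 0$. Repeating for $-\xi_j$ replacing $\xi_j$, we get

$$P_\Delta \le 2 \sum_{i\Delta < T} \exp\big(-C/(\lambda_n C_{in})\big), \qquad C = a^2/2\sigma^2, \qquad C_{in} = \sum_{j=m(t_n+i\Delta)}^{m(t_n+i\Delta+\Delta)} a_j^2 / \lambda_n. \tag{3.8}$$

Now, letting $\Delta$ be small enough yields the Theorem, since $\lim_{\Delta \to 0} \limsup_n \sup_i C_{in} = 0$.

Q.E.D.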
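The one-sided exponential bound used in the proof can be compared with a Monte Carlo estimate for a single window of indices. The sketch below is illustrative only: the window (50 consecutive indices starting at $n = 100$), the choice $\rho = 0.7$, the level $a = 2\sigma(\sum a_j^2)^{1/2}$, the seed, and the function name are all assumptions made here. With this level the bound equals $e^{-2}$.

```python
import math
import random

def exceed_frequency(n=100, window=50, rho=0.7, sigma=1.0, trials=4000, seed=1):
    """Empirical frequency of {max partial sum of a_j xi_j >= level} versus the
    bound exp(-level^2 / (2 sigma^2 sum a_j^2)) obtained by optimizing the exponent."""
    rng = random.Random(seed)
    steps = [(j + 1) ** (-rho) for j in range(n, n + window)]
    sum_sq = sum(a * a for a in steps)
    level = 2.0 * sigma * math.sqrt(sum_sq)  # plays the role of "a" in (3.7)
    bound = math.exp(-level ** 2 / (2.0 * sigma ** 2 * sum_sq))
    hits = 0
    for _ in range(trials):
        partial, peak = 0.0, 0.0
        for a in steps:
            partial += a * rng.gauss(0.0, sigma)
            peak = max(peak, partial)
        if peak >= level:
            hits += 1
    return hits / trials, bound

freq, bound = exceed_frequency()
print(freq, bound)  # the empirical frequency should not exceed the bound
```

The bound is not tight for a single window, but what matters in (3.8) is its exponential dependence on $1/\sum a_j^2$.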
4. Large Deviations Estimates for (1.1) and (1.2)

The key to the extension of Freidlin's theorem to the SA case is in getting the proper norming sequence $\{\lambda_n\}$ and the proper analog of the H-functional introduced in (3.1). Our guide is the method of proof, via Lemma 1.

The form of $H(\cdot,\cdot)$ and $\{\lambda_n\}$ for the systems (1.1) and (1.2). Let $\psi(\cdot)$ and $\alpha(\cdot)$ denote $R^r$-valued functions which are constant on the intervals $[i\Delta, i\Delta + \Delta)$, and where $|\psi(t)|$ is bounded by $\sup_{x \in G} |x|$. For each $x \in G$ and each $n$ define the piecewise linear function $x^{\psi,n}(\cdot)$ as follows. The break points are $\{0, t_{n+1} - t_n, t_{n+2} - t_n, \ldots\}$ and for $k > n$, set

$$x^{\psi,n}(t_k - t_n) = x + \sum_{j=n}^{k-1} a_j b(\psi(t_j - t_n)) + \sum_{j=n}^{k-1} a_j \xi_j, \tag{4.1}$$

for the case (1.1). For the case (1.2), replace $\xi_j$ by $\xi_j / c_j$. The $x^{\psi,n}(\cdot)$ replace the $x^{\psi,\epsilon}(\cdot)$ of (3.5), and they have the same interpolation intervals as do the $x^n(\cdot)$. Until further notice, work with case (1.1).

A natural analog of the H-functional of (3.1) is that defined by the limit in (4.2) (if it exists, for a suitable normalizing sequence $\{\lambda_n\}$). Recall that $T = N\Delta$.

$$\int_0^T H(x, \alpha(s), s)\, ds = \lim_n \lambda_n \log E \exp \sum_{i=0}^{N-1} \alpha'(i\Delta) \sum_{j=m(t_n+i\Delta)}^{m(t_n+i\Delta+\Delta)-1} a_j (b(x) + \xi_j) / \lambda_n. \tag{4.2}$$
Since

$$\lambda_n \log E \exp \sum_{i=0}^{N-1} \alpha'(i\Delta) \sum_{j=m(t_n+i\Delta)}^{m(t_n+i\Delta+\Delta)-1} a_j \xi_j / \lambda_n = \frac{1}{2} \sum_{i=0}^{N-1} \alpha'(i\Delta) R\, \alpha(i\Delta) \sum_{j=m(t_n+i\Delta)}^{m(t_n+i\Delta+\Delta)-1} a_j^2 / \lambda_n, \tag{4.3}$$

a natural candidate for $\lambda_n$ (and the one we use) is

$$\lambda_n = \sum_{j=n}^{m(t_n+T)} a_j^2. \tag{4.4}$$

By the same reasoning, for case (1.2), the natural candidate for $\lambda_n$ is

$$\lambda_n = \sum_{j=n}^{m(t_n+T)} a_j^2 / c_j^2. \tag{4.4'}$$

In order to get the correct form of $H(x,\alpha,s)$, we need to check the limit of (4.3) as $n \to \infty$. If there is a function $h(\cdot)$ such that

$$(4.3) \to \frac{1}{2} \int_0^T \alpha'(s) R\, \alpha(s)\, h(s)\, ds, \quad \text{as } n \to \infty, \tag{4.5}$$

then

$$H(x,\alpha,s) = \alpha' R \alpha\, h(s)/2 + \alpha' b(x). \tag{4.6}$$

From the results of Section 6, we get that the limit exists and that $h(s) = 1/T$ when $\rho < 1$ and $h(s) = (1 - e^{-T})^{-1} e^{-s}$ when $\rho = 1$. Note the peculiarity that the H-functional depends on $s$ as well as on $\alpha$ and $x$, when $\rho = 1$. For (1.2), one gets $h(s) = 1/T$ when $\rho < 1$ and $h(s) = (1 - 2\gamma)\, e^{-(1-2\gamma)s} (1 - e^{-(1-2\gamma)T})^{-1}$ when $\rho = 1$.
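The weight function $h(\cdot)$ for case (1.1) can be estimated numerically from the window sums of $a_j^2$. The sketch below compares the estimates with $1/T$ for $\rho < 1$ and with the window averages of $e^{-s}/(1 - e^{-T})$ for $\rho = 1$, which is what a direct integral approximation of the sums gives. The function names, the rule assigning index $j$ to the window containing $t_j - t_n$ (a simplification of the $m(\cdot)$ bookkeeping), and the values of $n$, $T$, $\Delta$ are assumptions made for illustration.

```python
import math

def window_sums(n, T, rho, delta):
    """Sum a_j^2 over indices j >= n whose interpolated time t_j - t_n lies in
    [i*delta, (i+1)*delta), for each window up to the horizon T.
    Returns (lambda_n, list of window sums)."""
    n_windows = int(round(T / delta))
    sums = [0.0] * n_windows
    elapsed = 0.0  # t_j - t_n
    j = n
    while elapsed < T:
        a_j = (j + 1) ** (-rho)
        i = min(int(elapsed / delta), n_windows - 1)
        sums[i] += a_j * a_j
        elapsed += a_j
        j += 1
    return sum(sums), sums

def h_estimate(n, T, rho, delta):
    """Window-by-window estimate of h: (window sum of a_j^2) / (lambda_n * delta)."""
    lam, sums = window_sums(n, T, rho, delta)
    return [s / (lam * delta) for s in sums]

T, delta = 1.0, 0.25
h_half = h_estimate(10**6, T, 0.5, delta)   # rho < 1: expect values near 1/T
h_one = h_estimate(2000, T, 1.0, delta)     # rho = 1: expect exponential decay in s
target_one = [
    (math.exp(-i * delta) - math.exp(-(i + 1) * delta)) / (delta * (1.0 - math.exp(-T)))
    for i in range(4)
]
lam_half, _ = window_sums(10**6, T, 0.5, delta)
print(h_half, h_one, target_one)
```

For $\rho < 1$ the same computation also illustrates that $\lambda_n$ is close to $T a_n$, which is the source of the order $O(a_n)$ for the norming sequence.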


The next Theorem yields an analog to (3.2) for the SA case. The specialization to the escape time problem appears in the next section.

Theorem 3. If $b(\cdot)$ is Lipschitz continuous and $\{\xi_n\}$ is i.i.d. Gaussian with zero mean, then for each set $A \subset C_x[0,T]$,

$$-\inf_{\phi \in A^0} S(T,\phi) \le \liminf_n \lambda_n \log P_x\{x^n(\cdot) \in A\} \le \limsup_n \lambda_n \log P_x\{x^n(\cdot) \in A\} \le -\inf_{\phi \in \bar A} S(T,\phi), \tag{4.7}$$

where

$$S(T,\phi) = \int_0^T L(\phi(s), \dot\phi(s), s)\, ds,$$

$$L(x,\beta,s) = \sup_\alpha\, [\alpha'\beta - H(x,\alpha,s)]. \tag{4.8a}$$

In particular $H(\cdot,\cdot,\cdot)$ is given by (4.6) and if $R$ is positive definite, then

$$L(x,\beta,s) = h^{-1}(s)\, (\beta - b(x))' R^{-1} (\beta - b(x)) / 2 = L_0(x,\beta)\, h^{-1}(s). \tag{4.8b}$$
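The closed form of the Lagrangian for positive definite $R$ follows from a direct maximization of the concave quadratic in $\alpha$. This is a routine check, written out here under the assumption that $H$ has the form $\alpha' R \alpha\, h(s)/2 + \alpha' b(x)$ with $h(s) > 0$:

```latex
\frac{\partial}{\partial \alpha}\Big[\alpha'\beta - \tfrac{1}{2} h(s)\, \alpha' R \alpha - \alpha' b(x)\Big] = 0
\;\Longrightarrow\;
\alpha^* = \frac{1}{h(s)} R^{-1} (\beta - b(x)),
```

```latex
L(x,\beta,s)
= \alpha^{*\prime} (\beta - b(x)) - \tfrac{1}{2} h(s)\, \alpha^{*\prime} R\, \alpha^*
= \frac{(\beta - b(x))' R^{-1} (\beta - b(x))}{2 h(s)} .
```

When $h(s) = 1/T$, the time dependence drops out and $L(x,\beta,s) = T L_0(x,\beta)$.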


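Later, we treat the degenerate case where

$$R = \begin{bmatrix} 0 & 0 \\ 0 & R_{22} \end{bmatrix},$$

and where $R_{22}$ is positive definite. For this case, define $R^{-1}$ by

$$R^{-1} = \begin{bmatrix} 0 & 0 \\ 0 & R_{22}^{-1} \end{bmatrix}.$$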
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Comments  on  the  proof.  In  Section  6,  we  show  that  the  limits 

(as  n  ^  )  of  (4.3)  exist  as  stated  for  the  given  h(«)*  We  now  discuss 
(4.7).  Let  <£(*)  be  a  continuous  function.  By  the  smoothness  of  H(  *,*,*), 
Gartners  result,  Lemma  1,  can  be  applied  to  the  vectors  x^*n  =  {xY ’ n( iA) , iA  <  T  *, 

S  -  ®  (4>(iA),i  <  \T),  just  as  it  was  applied  to  “  and  i ,  below 

(3.5).  From  here  on  one  follows  Freidlin  [5],  Theorem  2.1, almost  word  for 
word.  Only  the  differences  will  be  noted  here. 

First,  one  proves  the  analog  of  (3.6)  for  A^  replacing  C  and  x",n(‘) 
replacing  x^,e(*)*  Let  (Or  denote  citations  to  Freidlin 

r 

[5].  His  proof  uses  auxiliary  (Lemmas  3.1  and  3.2^  #  Inequality  (3.6)  is 

F 

derived  in  (Lemma  3.1)  .  The  proof  of  this  lemma  carries  over,  except  for  the 

F 

inequality  3  lines  below  (3.2)  and  the  set  inclusion  3  lines  below  (3.5)r. 

r  r 

But,  by  our  Theorem  2,  the  inequality  below  (3.2)  can  be  replaced  by  the 

r 

*t* 

following  (in  our  terminology  ):  For  each  n  >  0,  c  >  0  and  M  <  oc,  there 
are  A^  >  0 ,  <  «>  and  Cq  >  0  such  that  for  A  A^  and  n  >  n^, 

P{d(x^,n,0<  c}  >  P{d(x^’n,4?A)  <  cQ} 

-  P{  sup  |x^,n(iA+s)-x^’n(ii)  |  >M), 
iA<T,s 

where  the  last  probability  on  the  r.h.s.  is  <  exp  -M/A^ .  The  proof  uses  the 

finiteness  of  sup|b(x)|  and  Theorem  2.  Similarly,  the  set  inclusion 

x£G 

^Friedlins  symbols  (M^  T  ,n  ^  , 6 1  ^  ^  ,t;)  are  replaced  by  our 

(d,x^’n,c,crt,x^,n,<t)A  ,A  ).  Also  his  p  is  simply  the  Euclidean  distance  d. 

A  0  A  n 
below (3.5)$_F$ holds modulo a set whose probability is $\le \exp(-M/\lambda_n)$ for small enough $\Delta$ and large enough $n$. (Lemma 3.2)$_F$ carries over with no change. The counterpart to our $x^{\psi,n}(\cdot)$ in Freidlin's proof is his $x^{\varepsilon,\psi}(\cdot)$ (called $x^{\psi,\varepsilon}(\cdot)$ in (3.5)).

Only one change is required in the main part of the proof of (Theorem 2.1)$_F$. In the paragraph below (3.13)$_F$, the fact that $\{x^{\varepsilon,\psi}(\cdot);\ \varepsilon > 0,\ t \le T,\ |\psi(t)| \le B = \sup_{x \in G}|x|\} = Q_0$ and $\{x^{\varepsilon}(\cdot);\ \varepsilon > 0,\ t \le T\} = Q_1$ belong to a compact set in $C^r[0,T]$ is used. ($x^{\varepsilon,\psi}$ is the $x^{\psi,\varepsilon}$ in (3.5).) This compactness is used to guarantee that for each $\delta > 0$, there are an $N_\delta < \infty$ and functions $\phi_1,\dots,\phi_{N_\delta}$ (not depending on $\varepsilon$ or $\psi(\cdot)$) such that the union of the $\delta$-neighborhoods of the $\phi_i$ covers $Q_0 \cup Q_1$.

Our trajectories $\{x^{\psi,n}(\cdot),\ x^{n}(\cdot);\ t \le T,\ \varepsilon > 0,\ |\psi(t)| \le B\}$ do not belong to a compact set, since the Gaussian noise is unbounded. But, by Theorem 2, we obtain the following: For each $M < \infty$ and $\delta > 0$ there are $N_\delta < \infty$ and $\phi_1,\dots,\phi_{N_\delta}$ such that the union of the $\delta$-neighborhoods of the $\phi_i$ covers this set of trajectories except for a set of paths whose probability is $\le k \exp(-M/\lambda_n)$, where $k$ does not depend on $n$. Since the number $M$ in the estimates of the probabilities of the exceptional sets can be made arbitrarily large, we can carry through all the details (essentially as done in [5]) to obtain (4.7). The inequalities (4.7) are specialized to the escape time problem in the next section.




5. Escape Time Formulas

Let $A = \{\phi(\cdot): \phi(0) = x,\ \phi(t) \notin G \text{ for some } t \le T\}$. Then by (4.7) and the fact that $A$ is closed,

(5.1)  $-\inf_{\phi \in A^\circ} S(T,\phi) \ \le\ \liminf_n \lambda_n \log P_x\{\tau_G^n \le T\} \ \le\ \limsup_n \lambda_n \log P_x\{\tau_G^n \le T\} \ \le\ -\inf_{\phi \in A} S(T,\phi).$


In a sense, for 'almost all $G$', there is equality in (5.1). To see this, define $G_\delta = \{y: d(y,G) < \delta\}$, and set $A_\delta = \{\phi(\cdot): \phi(0) = x,\ \phi(t) \in \partial G_\delta \text{ for some } t \le T\}$, and $A_0 = A$. Define $S_x(G_\delta) = \inf_{\phi \in A_\delta} S(T,\phi)$. As $\delta \downarrow 0$, $S_x(G_\delta)$ decreases, and it is continuous at all $\delta > 0$ except for a countable number. Assume

(A5.1)  $S_x(G_\delta) \downarrow S_x(G)$ as $\delta \downarrow 0$.

The  condition  always  holds  in  the  non-degenerate  case  described  below. 

Theorem 4. Under (A5.1) and the given properties of the noise and $b(\cdot)$, equality holds in (5.1).

The proof follows from the facts that (a): $\lim_{\delta \downarrow 0} S_x(G_\delta) = \inf_{\phi \in A^\circ} S(T,\phi)$, and (b): by (A5.1), $\lim_{\delta \downarrow 0} S_x(G_\delta) = \inf_{\phi \in A} S(T,\phi)$.


Condition (A5.1) also implies that $S_x(G)$ is continuous at $x = \theta$, if it holds at $x = \theta$. Thus, it is not much of a restriction to assume equality in (5.1) and to let $\theta = x$, as we do henceforth. Now, consider the variational problem of getting the $\inf_{\phi \in A} S(T,\phi)$. If $R > 0$, we say that the problem is nondegenerate. If $R$ is singular, suppose for convenience that $R$ takes the form

$R = \begin{pmatrix} 0 & 0 \\ 0 & R_{22} \end{pmatrix}$, where $R_{22} > 0$,

and partition the vectors as follows: $x = (x_1,x_2)$, $\alpha = (\alpha_1,\alpha_2)$, $\beta = (\beta_1,\beta_2)$, $\phi = (\phi_1,\phi_2)$, where $x_2, \alpha_2, \beta_2$ and $\phi_2$ have the dimension of $R_{22}$. As noted below, in the non-degenerate case, we have
(5.2)  $L(x,\beta,s) = T(\beta - b(x))'R^{-1}(\beta - b(x))/2 = L_0(x,\beta)\,T, \quad \rho < 1,$

       $L(x,\beta,s) = (1 - e^{-T})(\beta - b(x))'R^{-1}(\beta - b(x))\,e^{s}/2 = L_0(x,\beta)\,e^{s}(1 - e^{-T}), \quad \rho = 1.$

Define $u = R^{-1/2}(\beta - b(x))$. Then the variational problem of calculating $\inf_{\phi \in A} S(T,\phi)$ is equivalent to minimizing

(5.3)  $S(T,\phi) = \frac{T}{2}\int_0^T u'u\,ds \quad \text{or} \quad \frac{1 - e^{-T}}{2}\int_0^T e^{s}\,u'u\,ds$

(depending on whether $\rho < 1$ or $\rho = 1$), subject to

(5.4)  $\dot\phi = b(\phi) + R^{1/2}u, \quad \phi(0) = x, \quad \phi(t) \in \partial G \text{ for some } t \le T.$
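As a rough numerical illustration of the nondegenerate cost functional, the following sketch evaluates both of its forms along one admissible path. Everything specific here is an assumption made for the example and is not taken from the text: a scalar state, $b(x) = -x$, $R = 1$, $G = (-1,1)$, the straight-line path $\phi(s) = s/T$ from $x = 0$ to the boundary, and the helper name `action`. The value obtained for a single path is only an upper bound on the infimum.

```python
import math

def action(phi, b, T, rho_is_one, r_var=1.0, steps=20000):
    """Midpoint-rule value of the cost functional for a scalar path phi on [0, T].

    Hypothetical helper: rho_is_one selects the exponentially weighted form,
    otherwise the unweighted form is used.
    """
    ds = T / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * ds
        # centred difference for the derivative of the path
        dphi = (phi(s + 0.5 * ds) - phi(s - 0.5 * ds)) / ds
        # control needed to follow the path: u = R^{-1/2} (phi' - b(phi))
        u = (dphi - b(phi(s))) / math.sqrt(r_var)
        weight = math.exp(s) if rho_is_one else 1.0
        total += weight * u * u * ds
    scale = (1.0 - math.exp(-T)) if rho_is_one else T
    return 0.5 * scale * total

if __name__ == "__main__":
    T = 2.0
    drift = lambda x: -x      # assumed dynamics b(x) = -x
    path = lambda s: s / T    # assumed path from 0 to the boundary point 1
    print("rho < 1 :", action(path, drift, T, rho_is_one=False))
    print("rho = 1 :", action(path, drift, T, rho_is_one=True))
```

For this path $u(s) = (1+s)/2$, so both integrals can also be evaluated by hand, which gives a simple check on the discretization.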

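In the degenerate case, the variational problem is (5.3'), (5.4'):

(5.3')  $S(T,\phi) = \frac{T}{2}\int_0^T u_2'u_2\,ds \quad \text{or} \quad \frac{1 - e^{-T}}{2}\int_0^T e^{s}\,u_2'u_2\,ds,$

subject to

(5.4')  $\dot\phi_1 = b_1(\phi), \quad \dot\phi_2 = b_2(\phi) + R_{22}^{1/2}u_2, \quad \phi(0) = x, \quad \phi(t) \in \partial G \text{ for some } t \le T.$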


6. The Values of $\lambda_n$ and $H(\cdot,\cdot,\cdot)$.

In this section, we evaluate $\lambda_n$ and obtain the $h(\cdot)$ in (4.6) and the $V$ of (1.3). Until further notice, we work with case (1.1). By the discussion leading to (4.6), to obtain $H(\cdot,\cdot,\cdot)$ we need only find $h(\cdot)$ such that

(6.1)  $\sum_{i=m(t_n+t)}^{m(t_n+t+\Delta)} a_i^2/\lambda_n \ \to\ h(t)\Delta + o(\Delta).$

We use integrals in place of sums henceforth, since the ratios of the integrals to the sums converge to unity in all cases, as $n \to \infty$. Also, $\alpha_n \sim \beta_n$ means that $\alpha_n/\beta_n \to 1$ as $n \to \infty$. By the definition of $m(t_n+t)$, for $t \ge 0$,

$\Big|\sum_{i=n}^{m(t_n+t)-1} a_i - t\Big| \le a_n.$


6.1 The case $\rho = 1$ and (1.1). By definition, $m(t_n+T)$ and $\lambda_n$ satisfy

$\int_n^{m(t_n+T)} s^{-1}\,ds = T, \qquad \lambda_n = \int_n^{m(t_n+T)} s^{-2}\,ds.$

Thus $m(t_n+T) \sim n e^{T}$ and

(6.2)  $\lambda_n \sim n^{-1}(1 - e^{-T}).$

Now, in order to evaluate (6.1), we need only evaluate (6.3) for $\rho = 1$:

(6.3)  $\int_{m(t_n+t)}^{m(t_n+t+\Delta)} s^{-2\rho}\,ds\,/\lambda_n = h(t)\Delta + o(\Delta).$

Since this equals $(1 - e^{-T})^{-1} e^{-t}(1 - e^{-\Delta})$, we have $h(t) = (1 - e^{-T})^{-1} e^{-t}$.
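The asymptotic statements of this subsection can be checked numerically with the original sums rather than the integrals. The sketch below is only an illustration under stated assumptions: the coefficients are taken to be $a_i = i^{-1}$, the index $m(t_n+t)$ is replaced by the first index at which the partial sum of the $a_i$ reaches $t$ (which differs from the definition in the text by at most one step), and the names `m_index`, `sq_sum` and `check_rho_one` are hypothetical.

```python
import math

def m_index(n, t, rho=1.0):
    """Stand-in for m(t_n + t): first m >= n with sum_{i=n}^{m-1} i^(-rho) >= t."""
    total, m = 0.0, n
    while total < t:
        total += m ** (-rho)
        m += 1
    return m

def sq_sum(lo, hi, rho=1.0):
    """Sum of a_i^2 = i^(-2 rho) over lo <= i < hi."""
    return sum(i ** (-2.0 * rho) for i in range(lo, hi))

def check_rho_one(n=20000, T=1.0, t=0.5, delta=0.05):
    """Ratios (numerical value / asymptotic prediction); each should be near 1."""
    m_T = m_index(n, T)
    lam = sq_sum(n, m_T)
    window = sq_sum(m_index(n, t), m_index(n, t + delta)) / lam
    predicted = math.exp(-t) * (1.0 - math.exp(-delta)) / (1.0 - math.exp(-T))
    return {
        "m_ratio": m_T / (n * math.exp(T)),
        "lambda_ratio": lam * n / (1.0 - math.exp(-T)),
        "window_ratio": window / predicted,
    }

if __name__ == "__main__":
    for name, value in check_rho_one().items():
        print(name, round(value, 4))
```

Increasing `n` should bring all three ratios closer to one, in line with the remark that sums and integrals agree asymptotically.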


6.2 The case $\rho < 1$ and (1.1). Here

$\int_n^{m(t_n+T)} s^{-\rho}\,ds = T, \qquad \lambda_n = \int_n^{m(t_n+T)} s^{-2\rho}\,ds.$

We have $m(t_n+T)^{1-\rho} = n^{1-\rho} + (1-\rho)T$ or $m(t_n+T) \sim n[1 + T/n^{1-\rho}]$, and $m(t_n+T) - n \sim T n^{\rho}$. This yields $\lambda_n \sim T n^{-\rho}$. Now, evaluating (6.3) yields (6.3) $= \Delta/T + o(\Delta)$. Thus $h(t) = 1/T$.
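The same kind of numerical check applies when the exponent is below one. This sketch assumes $a_i = i^{-\rho}$ with the illustrative value $\rho = 1/2$, again replaces $m(t_n+t)$ by a first-crossing index, and uses hypothetical helper names; it is not part of the original argument.

```python
def m_index(n, t, rho):
    """Stand-in for m(t_n + t): first m >= n with sum_{i=n}^{m-1} i^(-rho) >= t."""
    total, m = 0.0, n
    while total < t:
        total += m ** (-rho)
        m += 1
    return m

def sq_sum(lo, hi, rho):
    """Sum of a_i^2 = i^(-2 rho) over lo <= i < hi."""
    return sum(i ** (-2.0 * rho) for i in range(lo, hi))

def check_rho_below_one(n=10**6, rho=0.5, T=1.0, t=0.3, delta=0.1):
    """Ratios (numerical value / asymptotic prediction); each should be near 1."""
    m_T = m_index(n, T, rho)
    lam = sq_sum(n, m_T, rho)
    window = sq_sum(m_index(n, t, rho), m_index(n, t + delta, rho), rho) / lam
    return {
        "gap_ratio": (m_T - n) / (T * n ** rho),      # m - n against T n^rho
        "lambda_ratio": lam / (T * n ** (-rho)),      # lambda_n against T n^(-rho)
        "window_ratio": window / (delta / T),         # window weight against delta / T
    }

if __name__ == "__main__":
    for name, value in check_rho_below_one().items():
        print(name, round(value, 4))
```

The window ratio does not depend on `t` to leading order, which is the numerical counterpart of the constant weight $h(t) = 1/T$.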

6.3 The case $\rho = 1$ and (1.2), with $1 > 2\gamma$. Here $\lambda_n$ is defined by (4.4'). Thus $\lambda_n \sim (1-2\gamma)^{-1} n^{-(1-2\gamma)} (1 - e^{-T(1-2\gamma)})$. To obtain the proper weighting function $h(\cdot)$, we need to evaluate

(6.4)  $\int_{m(t_n+t)}^{m(t_n+t+\Delta)} s^{-2\rho+2\gamma}\,ds\,/\lambda_n = h(t)\Delta + o(\Delta).$

The left side of (6.4) is asymptotically equivalent to

$\Delta(1-2\gamma)\,e^{-t(1-2\gamma)}(1 - e^{-T(1-2\gamma)})^{-1} + o(\Delta) = \Delta h(t) + o(\Delta).$
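A numerical check is also possible for the Kiefer-Wolfowitz case. The sketch below rests on assumptions that are not confirmed by this section: the normalizer is taken to be the sum of $i^{-2\rho+2\gamma}$ over $n \le i < m(t_n+T)$ (read off from the integrand of (6.4)), the coefficients are $a_i = i^{-1}$, $\gamma = 1/4$ is an arbitrary illustrative value, and all function names are hypothetical.

```python
import math

def m_index(n, t):
    """Stand-in for m(t_n + t) with a_i = 1/i: first m >= n with sum_{i=n}^{m-1} 1/i >= t."""
    total, m = 0.0, n
    while total < t:
        total += 1.0 / m
        m += 1
    return m

def kw_sum(lo, hi, gamma):
    """Sum of i^(-2 + 2 gamma) over lo <= i < hi (assumed form of the weights)."""
    return sum(i ** (-2.0 + 2.0 * gamma) for i in range(lo, hi))

def check_kw(n=20000, gamma=0.25, T=1.0, t=0.5, delta=0.02):
    """Ratios (numerical value / asymptotic prediction); each should be near 1."""
    c = 1.0 - 2.0 * gamma
    lam = kw_sum(n, m_index(n, T), gamma)
    lam_pred = n ** (-c) * (1.0 - math.exp(-T * c)) / c
    window = kw_sum(m_index(n, t), m_index(n, t + delta), gamma) / lam
    h_t = c * math.exp(-t * c) / (1.0 - math.exp(-T * c))
    return {
        "lambda_ratio": lam / lam_pred,
        "window_ratio": window / (h_t * delta),   # differs from 1 by an o(delta)/delta term
    }

if __name__ == "__main__":
    for name, value in check_kw().items():
        print(name, round(value, 4))
```

Setting `gamma=0.0` in `check_kw` reduces the predictions to those of the case $\rho = 1$ for (1.1).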


6.4 The case $\rho < 1$ and (1.2) with $\rho > 2\gamma$. Here $\lambda_n \sim T n^{-\rho+2\gamma}$ and $h(t) = 1/T$.


6.5 The Lagrangian $L(\cdot,\cdot,\cdot)$ and $V$ of (1.3). Since $L(\cdot,\cdot,\cdot)$ and $\lambda_n$ have common factors, we define some new terms in order to obtain the simplest form of the asymptotic estimates. Define the set (which replaces $A$ in the degenerate case)

$A_D = \{\phi: \phi(0) = x,\ \dot\phi_1 = b_1(\phi),\ \phi(t) \in \partial G \text{ for some } t \le T\}.$

For (1.1) the $L(\cdot,\cdot,\cdot)$ are given by (5.2). Define

$S_1 = \inf_{\phi \in A} \int_0^T L_0(\phi(s),\dot\phi(s))\,e^{s}\,ds,$

$S_0 = \inf_{\phi \in A} \int_0^T L_0(\phi(s),\dot\phi(s))\,ds,$

$\bar S_1 = \inf_{\phi \in A_D} \int_0^T L_0(\phi(s),\dot\phi(s))\,e^{s}\,ds,$

$\bar S_0 = \inf_{\phi \in A_D} \int_0^T L_0(\phi(s),\dot\phi(s))\,ds.$

For the Kiefer-Wolfowitz procedure (1.2), we need the definition

$S_1(\mathrm{kw}) = \inf_{\phi \in A} \int_0^T L_0(\phi(s),\dot\phi(s))\,e^{(1-2\gamma)s}\,ds.$

Define $\bar S_1(\mathrm{kw})$ similarly, with $A_D$ replacing $A$.


For (1.1), we finally obtain for the non-degenerate case

(6.5)  $\lim_n n^{-1} \log P_x\{\tau_G^n \le T\} = -S_1 \quad (\text{for } \rho = 1),$

       $\lim_n n^{-\rho} \log P_x\{\tau_G^n \le T\} = -S_0 \quad (\text{for } \rho < 1).$


For the degenerate case, replace $S_1$ and $S_0$ in (6.5) by $\bar S_1$ and $\bar S_0$ (the infima taken over the degenerate-case set), resp. All of these quantities are continuous in $x$ in a neighborhood of $\theta$. For (1.2), we have


for  the  non-degenerate  case 


with  the  obvious  alterations  of  the  right  side  for  the  degenerate  case. 

Note that $S_1 > S_0$. This, together with the relationship $n > n^{\rho}$ for $\rho < 1$, implies the 'considerable' superiority of the coefficient sequence $a_n = n^{-1}$